This post will provide an example using the configuration integral for the canonical ensemble in statistical mechanics. I want to evaluate the integral using various methods, starting with grid-based integration methods. Using a grid-based method suffers from the curse of dimensionality, where adding particles raises the time to compute exponentially (which is why Monte Carlo methods are normally used). However, it can still be useful to check Monte Carlo codes with other integration methods for small particle numbers. Also, the free energy (and all associated quantities) is computed directly, where it is more difficult to compute with MC methods. Finally, I'm curious as to how large of a system could be computed on today's hardware.
Sympy is the computer algebra system I'm using. The code for this example is available on github in the derivation_modeling branch, in the prototype subdirectory.
The output (MathML and generated python) can be seen here.
The flow is as follows:
- Derive the composite trapezoidal rule, starting from the expression for a single interval
- Generate code for the composite trapezoidal rule
- Start from the configuration integral (which I've called the partition function in the documents) and transform it by specializing various parameters - 2 particles in 2 spatial dimensions, using the Lennard-Jones interaction potential, etc - until it becomes completely specified numerically.
- Generate code that evaluates this integral using the generated integration code from step 2
There's still a tremendous amount of work to do on this, but hopefully an outline of this style of programming is visible from the example.
- The MathML pages have a lot of detailed small steps - it would be nice to have a collapsible hierarchy so all the little steps can be hidden.
- Some abstractions should remain during the steps - in particular the potential. The expression for the integrand can get quite messy from explicitly inserting the expression for the potential (and this is a simple case - more complex potentials would make it unreadable).
- More backends for the code generation - C/C++ is the obvious choice in pursuit of higher performance. (The generated python can be run through Shed Skin, for another route to C++). Converting to Theano expressions might be a quick way to try it out on the GPU.